fix: check agent server readiness timeout return value (fixes #1631)#1655
Conversation
- Add return value checking for ready_event.wait() to detect timeouts - Log warning when server doesn't become ready within timeout period - Make timeout configurable via PRAISONAI_SERVER_READY_TIMEOUT env var - Add comprehensive regression tests for timeout behavior Co-authored-by: praisonai-triage-agent[bot] <praisonai-triage-agent[bot]@users.noreply.github.com>
β You've reached your Qodo monthly free-tier limit. Reviews pause until next month β upgrade your plan to continue now, or link your paid account if you already have one. |
There was a problem hiding this comment.
Code Review
This pull request introduces a configurable timeout for the agent server readiness check using the PRAISONAI_SERVER_READY_TIMEOUT environment variable and includes unit tests for the new functionality. A review comment suggests adding a try-except block to handle potential ValueError exceptions when parsing the environment variable to prevent application crashes on invalid configuration.
| timeout = float(os.environ.get("PRAISONAI_SERVER_READY_TIMEOUT", "5.0")) | ||
| became_ready = ready_event.wait(timeout=timeout) |
There was a problem hiding this comment.
The float() conversion will raise a ValueError if the PRAISONAI_SERVER_READY_TIMEOUT environment variable is set to a non-numeric value (e.g., an empty string or alphabetic characters). It is safer to handle this exception and fall back to the default timeout value to prevent the application from crashing on startup due to a configuration error.
try:
timeout = float(os.environ.get("PRAISONAI_SERVER_READY_TIMEOUT", "5.0"))
except ValueError:
logger.warning("Invalid PRAISONAI_SERVER_READY_TIMEOUT value. Using default 5.0s.")
timeout = 5.0
became_ready = ready_event.wait(timeout=timeout)|
No actionable comments were generated in the recent review. π βΉοΈ Recent review infoβοΈ Run configurationConfiguration used: defaults Review profile: CHILL Plan: Pro Run ID: π Files selected for processing (1)
π§ Files skipped from review as they are similar to previous changes (1)
π WalkthroughWalkthroughThe PR makes the agent server readiness timeout configurable and observable. It reads a timeout from PRAISONAI_SERVER_READY_TIMEOUT (default 5.0s), uses it for ready_event.wait, logs a warning if the wait times out, and adds unit tests covering timeout, configurability, default, success, and invalid-env paths. ChangesServer Readiness Timeout Configuration
Estimated code review effortπ― 3 (Moderate) | β±οΈ ~20 minutes Possibly related PRs
Poem
π₯ Pre-merge checks | β 5β Passed checks (5 passed)
βοΈ Tip: You can configure your own custom pre-merge checks in the settings. β¨ Finishing Touchesπ Generate docstrings
π§ͺ Generate unit tests (beta)
Tip π¬ Introducing Slack Agent: The best way for teams to turn conversations into code.Slack Agent is built on CodeRabbit's deep understanding of your code, so your team can collaborate across the entire SDLC without losing context.
Built for teams:
One agent for your entire SDLC. Right inside Slack. π Get started Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
|
@copilot Do a thorough review of this PR. Read ALL existing reviewer comments above from Qodo, Coderabbit, and Gemini first β incorporate their findings. Review areas:
|
Greptile SummaryThis PR fixes the silent failure when the agent server doesn't start within the readiness timeout by checking the return value of
Confidence Score: 5/5Safe to merge; the change only adds logging and env-var configuration without altering the server startup path or return value for callers. The new code path is additive β it wraps the existing ready_event.wait() call, guards the env-var float conversion with a try/except, and emits a warning on timeout. No existing behavior is changed for callers that already succeed, and the invalid-env-var crash flagged in earlier review rounds is now handled. Tests confirm all branches are exercised. No files require special attention; both changed files are straightforward and low risk. Important Files Changed
Sequence DiagramsequenceDiagram
participant Caller
participant Registry as _AgentServerRegistry
participant Env as os.environ
participant Thread as daemon Thread
participant Uvicorn as uvicorn.Server
Caller->>Registry: start_server_if_needed(port, host)
Registry->>Env: get("PRAISONAI_SERVER_READY_TIMEOUT", "5.0")
Env-->>Registry: value (or "5.0")
alt valid float
Registry->>Registry: "timeout = float(value)"
else ValueError
Registry->>Registry: "log warning, timeout = 5.0"
end
Registry->>Thread: start (run_server)
Thread->>Thread: import uvicorn, build Config
Thread->>Registry: ready_event.set()
Thread->>Uvicorn: server.run()
Registry->>Registry: "became_ready = ready_event.wait(timeout)"
alt "became_ready == False"
Registry->>Registry: log WARNING (port, timeout)
end
Registry-->>Caller: True
Reviews (3): Last reviewed commit: "fix: use monkeypatch.delenv and move imp..." | Re-trigger Greptile |
There was a problem hiding this comment.
Pull request overview
Fixes a silent failure/observability gap in _AgentServerRegistry.start_server_if_needed() by checking the boolean return value of threading.Event.wait() and warning when the server readiness signal isnβt received within a (now configurable) timeout.
Changes:
- Check the
ready_event.wait()return value and emit a warning on timeout. - Add
PRAISONAI_SERVER_READY_TIMEOUTenv var support for configurable readiness timeout. - Add unit tests intended to cover timeout, env configuration, and success/no-warning paths.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
src/praisonai-agents/praisonaiagents/agents/agents.py |
Adds configurable readiness timeout and logs a warning when the server does not become ready in time. |
src/praisonai-agents/tests/unit/agents/test_server_registry_timeout.py |
Introduces regression tests for readiness timeout behavior and env var configuration. |
π‘ Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| ready_event.wait(timeout=5.0) # Deterministic wait instead of sleep(0.5) | ||
|
|
||
| # Check for configurable timeout via environment variable | ||
| timeout = float(os.environ.get("PRAISONAI_SERVER_READY_TIMEOUT", "5.0")) |
| # Use a very short timeout to ensure the test doesn't hang | ||
| with patch.dict("os.environ", {"PRAISONAI_SERVER_READY_TIMEOUT": "0.1"}): | ||
| result = registry.start_server_if_needed(8000, host="127.0.0.1") | ||
|
|
||
| # The method should still return True but log a warning | ||
| assert result is True | ||
| assert "8000" in caplog.text |
| with patch.dict("os.environ", {}, clear=True): | ||
| with patch("threading.Event.wait") as mock_wait: | ||
| mock_wait.return_value = False | ||
| result = registry.start_server_if_needed(8003, host="127.0.0.1") | ||
|
|
||
| # Should have called wait with default timeout of 5.0 | ||
| mock_wait.assert_called_once_with(timeout=5.0) | ||
| assert result is True No newline at end of file |
|
|
||
| # Register the app but don't set the ready event | ||
| with patch.dict(registry._apps, {8001: mock_app}): | ||
| with patch.dict(registry._ready_events, {8001: threading.Event()}): | ||
| with patch.dict(registry._started, {8001: False}): | ||
| # Test with custom timeout | ||
| with patch.dict("os.environ", {"PRAISONAI_SERVER_READY_TIMEOUT": "0.05"}): | ||
| import time | ||
| start_time = time.time() | ||
| result = registry.start_server_if_needed(8001, host="127.0.0.1") | ||
| duration = time.time() - start_time | ||
|
|
||
| # Should respect the custom timeout and complete quickly | ||
| assert result is True | ||
| assert duration < 1.0 # Should be much less than default 5s |
|
@claude You are the FINAL architecture reviewer. If the branch is under MervinPraison/PraisonAI (not a fork), you are able to make modifications to this branch and push directly. SCOPE: Focus ONLY on Python packages (praisonaiagents, praisonai). Do NOT modify praisonai-rust or praisonai-ts. Read ALL comments above from Gemini, Qodo, CodeRabbit, and Copilot carefully before responding. Phase 1: Review per AGENTS.md
Phase 2: FIX Valid Issues Phase 3: Final Verdict |
|
Claude finished @MervinPraison's task ββ View job Final Architecture Review & Issue Resolution βPhase 1: Architecture Review per AGENTS.md
Phase 2: Fix Valid Issues
Phase 3: Final Verdict
Architecture Analysis ββ AGENTS.md Compliance:
Issues Resolved βπ§ FIXED: Critical ValueError crash (src/praisonai-agents/praisonaiagents/agents/agents.py:131-135) # Before (CRASH RISK):
timeout = float(os.environ.get("PRAISONAI_SERVER_READY_TIMEOUT", "5.0"))
# After (SAFE):
try:
timeout = float(os.environ.get("PRAISONAI_SERVER_READY_TIMEOUT", "5.0"))
except ValueError:
logger.warning("Invalid PRAISONAI_SERVER_READY_TIMEOUT value. Using default 5.0s.")
timeout = 5.0π§ FIXED: Test robustness (tests/unit/agents/test_server_registry_timeout.py)
Final Verdict: β APPROVEDQuality Assessment:
Remaining Minor Issues (Not Blocking):
Files Modified:
Summary: This PR successfully addresses the silent failure mode identified in issue #1631 while maintaining architectural integrity. All critical reviewer feedback has been resolved. The implementation is production-ready and follows PraisonAI SDK guidelines. Commit: 3af74693 | Branch: claude/issue-1631-20260507-0942 |
There was a problem hiding this comment.
Actionable comments posted: 3
π€ Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@src/praisonai-agents/praisonaiagents/agents/agents.py`:
- Around line 131-133: Parse PRAISONAI_SERVER_READY_TIMEOUT with a safe guard:
instead of calling float(os.environ.get(...)) directly, wrap the float
conversion in a try/except that catches ValueError and TypeError, set timeout =
5.0 on error, and emit a warning (using the module logger or process_logger)
that the env value was invalid and the default 5.0 is being used; then pass that
timeout into ready_event.wait(timeout=timeout).
In `@src/praisonai-agents/tests/unit/agents/test_server_registry_timeout.py`:
- Around line 14-25: The timeout test is flaky because it relies on real timing;
modify the test for registry.start_server_if_needed to stub out
threading.Thread.start (or the thread object) and replace the ready_event.wait()
return value deterministically by patching the event in registry._ready_events
to a mock whose wait method returns False (to exercise timeout) or True (to
exercise success), and assert that wait was called with the expected timeout
parsed from PRAISONAI_SERVER_READY_TIMEOUT; reference
registry.start_server_if_needed, registry._apps, registry._ready_events,
registry._started, and the ready_event.wait() mock when making these changes.
- Around line 1-98: The new tests only exercise internal server-registry
behavior; add a real agentic integration/e2e test that creates a real Agent
instance and calls agent.start(...) with a concrete prompt and asserts a
non-empty text response so the LLM is actually invoked; locate where other agent
integration tests live and add a test (e.g., test_agent_end_to_end_start) that
uses the real agent class (Agent or whichever public factory is exposed) and its
start() method, configures any required real LLM credentials/environment, runs
agent.start() with a short prompt, awaits the response, and asserts the returned
text contains expected content (non-empty); mark the test appropriately
(integration/slow or uses_network) so CI knows it requires network/LLM access
and ensure cleanup of any started servers (or reuse _AgentServerRegistry) so
this new test satisfies the repository requirement for a real agentic test.
πͺ Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
βΉοΈ Review info
βοΈ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 21e1f9bd-db19-42e8-84e1-d892e810ef0e
π Files selected for processing (2)
src/praisonai-agents/praisonaiagents/agents/agents.pysrc/praisonai-agents/tests/unit/agents/test_server_registry_timeout.py
| """Test for agent server readiness timeout warning functionality.""" | ||
|
|
||
| import threading | ||
| import logging | ||
| import pytest | ||
| from unittest.mock import patch, MagicMock | ||
| from praisonaiagents.agents.agents import _AgentServerRegistry | ||
|
|
||
|
|
||
| def test_server_readiness_timeout_emits_warning(caplog): | ||
| """A server that does not start in time must produce a WARNING log entry.""" | ||
| registry = _AgentServerRegistry() | ||
|
|
||
| # Create a mock FastAPI app that won't signal readiness | ||
| mock_app = MagicMock() | ||
|
|
||
| # Register the app but don't set the ready event | ||
| with patch.dict(registry._apps, {8000: mock_app}): | ||
| with patch.dict(registry._ready_events, {8000: threading.Event()}): | ||
| with patch.dict(registry._started, {8000: False}): | ||
| with caplog.at_level(logging.WARNING): | ||
| # Use a very short timeout to ensure the test doesn't hang | ||
| with patch.dict("os.environ", {"PRAISONAI_SERVER_READY_TIMEOUT": "0.1"}): | ||
| result = registry.start_server_if_needed(8000, host="127.0.0.1") | ||
|
|
||
| # The method should still return True but log a warning | ||
| assert result is True | ||
| assert "8000" in caplog.text | ||
| assert "did not become ready" in caplog.text.lower() | ||
|
|
||
|
|
||
| def test_server_readiness_timeout_configurable_via_env(): | ||
| """The timeout should be configurable via PRAISONAI_SERVER_READY_TIMEOUT.""" | ||
| registry = _AgentServerRegistry() | ||
|
|
||
| # Create a mock FastAPI app | ||
| mock_app = MagicMock() | ||
|
|
||
| # Register the app but don't set the ready event | ||
| with patch.dict(registry._apps, {8001: mock_app}): | ||
| with patch.dict(registry._ready_events, {8001: threading.Event()}): | ||
| with patch.dict(registry._started, {8001: False}): | ||
| # Test with custom timeout | ||
| with patch.dict("os.environ", {"PRAISONAI_SERVER_READY_TIMEOUT": "0.05"}): | ||
| import time | ||
| start_time = time.time() | ||
| result = registry.start_server_if_needed(8001, host="127.0.0.1") | ||
| duration = time.time() - start_time | ||
|
|
||
| # Should respect the custom timeout and complete quickly | ||
| assert result is True | ||
| assert duration < 1.0 # Should be much less than default 5s | ||
|
|
||
|
|
||
| def test_server_readiness_success_no_warning(caplog): | ||
| """A server that starts in time should not produce any warning.""" | ||
| registry = _AgentServerRegistry() | ||
|
|
||
| # Create a mock FastAPI app | ||
| mock_app = MagicMock() | ||
| ready_event = threading.Event() | ||
|
|
||
| # Register the app and signal readiness immediately | ||
| with patch.dict(registry._apps, {8002: mock_app}): | ||
| with patch.dict(registry._ready_events, {8002: ready_event}): | ||
| with patch.dict(registry._started, {8002: False}): | ||
| # Set the event immediately to simulate quick startup | ||
| ready_event.set() | ||
|
|
||
| with caplog.at_level(logging.WARNING): | ||
| result = registry.start_server_if_needed(8002, host="127.0.0.1") | ||
|
|
||
| # Should succeed without warnings | ||
| assert result is True | ||
| assert "did not become ready" not in caplog.text.lower() | ||
|
|
||
|
|
||
| def test_default_timeout_value(): | ||
| """Test that the default timeout is 5.0 seconds when env var is not set.""" | ||
| registry = _AgentServerRegistry() | ||
|
|
||
| # Create a mock FastAPI app | ||
| mock_app = MagicMock() | ||
| ready_event = threading.Event() | ||
|
|
||
| # Register the app but don't set the ready event | ||
| with patch.dict(registry._apps, {8003: mock_app}): | ||
| with patch.dict(registry._ready_events, {8003: ready_event}): | ||
| with patch.dict(registry._started, {8003: False}): | ||
| # Ensure no custom timeout is set | ||
| with patch.dict("os.environ", {}, clear=True): | ||
| with patch("threading.Event.wait") as mock_wait: | ||
| mock_wait.return_value = False | ||
| result = registry.start_server_if_needed(8003, host="127.0.0.1") | ||
|
|
||
| # Should have called wait with default timeout of 5.0 | ||
| mock_wait.assert_called_once_with(timeout=5.0) | ||
| assert result is True No newline at end of file |
There was a problem hiding this comment.
Add an agentic integration/e2e test for this feature path.
This file adds only internal unit tests; please add at least one real agentic test (through agent.start() with a real prompt and text response) to satisfy repository test requirements for new features.
As per coding guidelines, "Real agentic tests are MANDATORY for every feature: Agent must call agent.start() with a real prompt, call the LLM, and produce actual text responseβnot just smoke tests of object construction."
π€ Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@src/praisonai-agents/tests/unit/agents/test_server_registry_timeout.py`
around lines 1 - 98, The new tests only exercise internal server-registry
behavior; add a real agentic integration/e2e test that creates a real Agent
instance and calls agent.start(...) with a concrete prompt and asserts a
non-empty text response so the LLM is actually invoked; locate where other agent
integration tests live and add a test (e.g., test_agent_end_to_end_start) that
uses the real agent class (Agent or whichever public factory is exposed) and its
start() method, configures any required real LLM credentials/environment, runs
agent.start() with a short prompt, awaits the response, and asserts the returned
text contains expected content (non-empty); mark the test appropriately
(integration/slow or uses_network) so CI knows it requires network/LLM access
and ensure cleanup of any started servers (or reuse _AgentServerRegistry) so
this new test satisfies the repository requirement for a real agentic test.
β¦ut env var - Add try/catch for float() conversion to prevent ValueError crashes - Prevent real server spawning in unit tests via Thread.start mocking - Add test for invalid env var fallback behavior - Preserve other env vars during testing (avoid clear=True) Addresses feedback from Gemini, CodeRabbit, Qodo, and Copilot reviewers. Co-authored-by: Mervin Praison <MervinPraison@users.noreply.github.com>
- Replace patch.dict(os.environ, {"": ""}) in test_default_timeout_value
with monkeypatch.delenv so the test exercises the true "env var unset"
default path rather than the ValueError fallback path
- Move `import time` from inside test function body to module-level imports
Agent-Logs-Url: https://github.com/MervinPraison/PraisonAI/sessions/5c472f85-484f-4a03-b47a-9307639ecd0a
Co-authored-by: MervinPraison <454862+MervinPraison@users.noreply.github.com>
Here is the full review and the fixes applied: Review Summary 1. Bloat check β
2. Security β
3. Performance β
4. Tests β
5. Backward compat β
6. Code quality β Issues fixed (commit
7. Noted pre-existing issue (out of scope for this PR) |
Fixes #1631
Summary
This PR fixes a silent failure mode in the agent server startup process where
ready_event.wait(timeout=5.0)return values were ignored, causing difficult-to-debug failures when servers didn't start within the timeout period.Changes
ready_event.wait()to detect timeoutsPRAISONAI_SERVER_READY_TIMEOUTenv varTesting
tests/unit/agents/test_server_registry_timeout.pyBackward Compatibility
Generated with Claude Code
Summary by CodeRabbit
New Features
Tests